[SE-3520] Fixes Transcripts Incompletely Uploaded to S3 Bucket #266
Thanks for the pull request, @nizarmah! I've created OSPR-5084 to keep track of it in JIRA, where we prioritize reviews. Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green.
efa8cc4 to 17dae12
@nizarmah Thank you for your contribution. Please let me know once it is ready for our review.
Please add a relevant unit test. Thanks
@natabene I was going to say that this is ready for your review. Thanks for moving along with it, I really appreciate it!

@DawoudSheraz I'll definitely add the relevant unit tests 🙂 I just have a small question that's kind of separate from the unit tests.

Would appreciate any help regarding this, @natabene @DawoudSheraz: I identified the code, and fixed it so that it does overwrite existing transcripts. Now the questions I had were, should I open a new PR for that? Or would you prefer I include it in this PR? Or is it something that edX intentionally did like that and isn't interested in making it possible to overwrite transcripts?
@nizarmah Hi. Thanks for the follow-up. Regarding your question about not supporting overwriting, that code addition was intentional. If a video already exists on an instance, re-uploading is merely creating and uploading the duplicate of an already existing file. There can be instances where the transcripts are edited manually in the export but that ideally should be uploaded from Studio/Video block to keep the data consistent.
@DawoudSheraz ah that makes it a little bit more difficult. Thanks for the extremely quick reply btw, I really appreciate it! So I guess this makes me curious, would you consider upstreaming such a change if we compare the transcript data that is being uploaded with the already existing one? Maybe we can add a hidden content hash for each video transcript and use it to verify if the data changed or not? But if you would prefer not to upstream such a change, I totally understand. I'm just asking so that I know that I tried my best to upstream such a change 😅 hahaha.
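The content-hash idea floated here could look roughly like the following. This is only a sketch of the comparison step: `content_hash` and `transcript_changed` are hypothetical names, SHA-256 is an assumed choice, and nothing like this is confirmed to exist in edx-val.

```python
import hashlib


def content_hash(transcript_bytes):
    """Return a hex digest identifying the transcript content (hypothetical helper)."""
    return hashlib.sha256(transcript_bytes).hexdigest()


def transcript_changed(existing_bytes, incoming_bytes):
    """Report whether the incoming transcript differs from the stored one."""
    return content_hash(existing_bytes) != content_hash(incoming_bytes)


stored = 'Hello, edX greets you.'.encode('utf-8')
same_upload = 'Hello, edX greets you.'.encode('utf-8')
edited_upload = 'Hello, edX greets you all.'.encode('utf-8')

# An identical re-upload could be skipped; an edited one could be overwritten.
skip_upload = not transcript_changed(stored, same_upload)
needs_overwrite = transcript_changed(stored, edited_upload)
```

Storing the digest alongside each transcript would avoid re-reading the stored file on every import, at the cost of an extra field to keep in sync.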
@nizarmah It can be done in a follow-up OSPR but an internal evaluation will be needed if the change is even needed based on the use cases. Thanks
3610c7c to 047288d
Alright @DawoudSheraz, I added a unit test. To validate that it works, I made the following change:

    # Create transcript record.
    create_video_transcript(
        video_id=edx_video_id,
        language_code=language_code,
        file_format=file_format,
-       content=ContentFile(file_content.encode('utf-8')),
+       content=ContentFile(file_content),
        provider=provider
    )

This resulted in the following error:

>       content_encoding = chardet.detect(transcript_content.read())['encoding']

edxval/tests/test_api.py:1934:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
byte_str = 'Hello, edX greets you. random utf-8 characters: éâô'

    def detect(byte_str):
        """
        Detect the encoding of the given byte string.
        :param byte_str: The byte sequence to examine.
        :type byte_str: ``bytes`` or ``bytearray``
        """
        if not isinstance(byte_str, bytearray):
            if not isinstance(byte_str, bytes):
                raise TypeError('Expected object of type bytes or bytearray, got: '
>                               '{0}'.format(type(byte_str)))
E               TypeError: Expected object of type bytes or bytearray, got: <class 'str'>

.tox/py35-django22/lib/python3.5/site-packages/chardet/__init__.py:34: TypeError

Let me know if you'd like me to add anything else 🙂 👍
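The TypeError above comes down to Python 3 keeping text (`str`) and binary data (`bytes`) separate. Below is a minimal stdlib-only sketch of the same failure mode and the fix. It uses `io.BytesIO` as a stand-in for the file object, and the helper name `to_binary_file` is hypothetical, not part of edx-val.

```python
import io


def to_binary_file(file_content):
    """Wrap transcript content in a binary file-like object (hypothetical helper)."""
    if isinstance(file_content, str):
        # Text has to be encoded before it can be handled as binary data.
        file_content = file_content.encode('utf-8')
    return io.BytesIO(file_content)


transcript_text = 'Hello, edX greets you. random utf-8 characters: \u00e9\u00e2\u00f4'

# Handing str to an API that expects bytes fails, much like the chardet check above.
try:
    io.BytesIO(transcript_text)
    raised = False
except TypeError:
    raised = True

# Encoding first gives a binary file whose content round-trips cleanly.
binary_file = to_binary_file(transcript_text)
round_trip = binary_file.read().decode('utf-8')
```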
Please update the version in setup.py before merging
That's been done @DawoudSheraz 👍 Thanks a lot for your review 🙂
👍 🎉
- I tested this: Checked the code working in our dev server
- I read through the code
- I checked for accessibility issues NA
- Includes documentation NA
- I made sure any change in configuration variables is reflected in the corresponding client's configuration-secure repository. NA
@natabene @DawoudSheraz is there anything blocking this PR from getting merged? Please let me know 👍
@DawoudSheraz If you think this is good to be merged, please merge - community authors don't have permissions to do so.
@nizarmah 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
@nizarmah I have merged and created a release tag https://github.com/edx/edx-val/releases/tag/1.4.3. The changes will soon be part of the platform after a requirements update. Thanks for your contribution.
Thanks a lot! 😄
There's an issue when video transcripts are uploaded to S3: transcripts that get added through Import Course don't end up getting uploaded to S3 because of the following error:
Accordingly, this seemed like an existing issue with boto, which requires specifying the file encoding. More context on the issue can be found in the linked discussion below.
JIRA tickets: OSPR-5084, SE-3520
Discussions: boto/boto#2868
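Why encoding matters for an upload can be illustrated with plain Python: once a transcript contains non-ASCII characters, its character count and its UTF-8 byte count differ, so anything that sizes a payload by characters would come up short. Whether this is the exact mechanism behind the boto behaviour is an assumption here, not something the description states.

```python
# Sample text with three non-ASCII characters (é, â, ô).
text = 'random utf-8 characters: \u00e9\u00e2\u00f4'
encoded = text.encode('utf-8')

char_count = len(text)      # number of characters in the str
byte_count = len(encoded)   # number of bytes actually sent over the wire

# Each of é, â, ô takes two bytes in UTF-8, so the byte count is larger.
extra_bytes = byte_count - char_count

# Cutting the payload at the character count drops the tail of the content.
truncated = encoded[:char_count]
```

Encoding the content to bytes up front, as this PR does with `file_content.encode('utf-8')`, makes the payload size unambiguous.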
Installation instructions:
- edxapp user, by sudo -Hu edxapp bash.
- cms or all services.

Testing instructions:
- edx-val by following the Installation Instructions.

Author notes & concerns: